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Abstract 

Aiming at abundant scientific and engineering data with not only high dimen¬ 
sionality but also complex structure, we study the regression problem with a multi¬ 
dimensional array (tensor) response and a vector predictor. Applications include, 
among others, comparing tensor images across groups after adjusting for additional 
covariates, which is of central interest in neuroimaging analysis. We propose par¬ 
simonious tensor response regression adopting a generalized sparsity principle. It 
models all voxels of the tensor response jointly, while accounting for the inher¬ 
ent structural information among the voxels. It effectively reduces the number of 
free parameters, leading to feasible computation and improved interpretation. We 
achieve model estimation through a nascent technique called the envelope method, 
which identifies the immaterial information and focuses the estimation based upon 
the material information in the tensor response. We demonstrate that the result¬ 
ing estimator is asymptotically efficient, and it enjoys a competitive finite sample 
performance. We also illustrate the new method on two real neuroimaging studies. 
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1 Introduction 


For modern scientific data, an overarching feature that accompanies high or ultrahigh di¬ 
mensionality is the complex structure of the data. For instance, in neuroimaging studies, 
electroencephalography (EEG) measures voltage value from electrodes placed at various 
scalp locations over a period of time, and the resulting data is a two-dimensional matrix 
where the readings are both spatially and temporally correlated. Similarly, anatom¬ 
ical magnetic resonance imaging (MRI) takes the form of a three-dimensional array, 
where voxels correspond to spatial locations of the brain and are spatially correlated. 
Multi-dimensional array type data also frequently arise in chemometrics, econometrics, 
psychometrics, and many other applications. In our study, we are primarily interested in 
comparing multidimensional array, also know as tensor, under two or more conditions, 
after adjusting for other potentially confounding variables. Our motivating examples 
came from two neurological imaging studies, while the proposed methodology equally 
applies to a variety of scientihc and engineering applications as well. One example is to 
compare the EEG scans between the alcoholic subjects and the general population, and 
the second is to compare the MRI scans of brains between the subjects with attention 
dehcit hyperactivity disorder (ADHD) and the healthy controls after adjusting for age 
and gender of the subjects. This tensor comparison problem can be more generally for¬ 
mulated as a regression with the image tensor as response and the group indicator and 
other covariates as predictors. In this article, we aim to study this problem and term it 
tensor response regression. 

While there is an enormous body of literature tackling regression with high- or 
ultrahigh-dimensional predictors, there have been relatively few works on regression 
with multivariate response, and even fewer works on regression with tensor response. 
We review three major lines of related research. The hrst concerns multivariate vector 


response regression, and popular solutions include partial least squares 

Holland, 1990 

1992 Ghun and Kele§ 

201C 

), canonical correlations (Zhou and He, 

2008 

, reduced-rank 

regressions (Izenman, 

1975 

Reinsel and Vein, 1998 Yuan et ah, 2007 

), sparse regressions 

with various penalties incorporating correlated response variables 

(Simila and Tikka 
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2007 Turlach et al., 2005; Peng et al., 2010), and sparse reduced-rank regressions (Chen 


and Huang, 2012). Most of existing solutions adopt a linear association between the 


multivariate response and predictors, and they universally deal with the case where the 
response variables are organized in the form of a vector. Our goal, however, is to model 
a tensor response, where the vector response can be viewed as a special case of a one¬ 
dimensional tensor. The second line of research directly models association between 
an image tensor and a vector of predictors in the context of brain imaging analysis. 
The dominating solution in the held regresses one response variable (voxel) at a time 


(Friston et ah, 2007), and thus completely ignores underlying correlations among the 
voxels (Li et ah, 2011). Li et al. (2011) and its follow-up works (Skup et al. [M^ pl 


et al., 2013) proposed a multiscale adaptive approach to smooth imaging response and 


to estimate parameters by building iteratively increasing neighbors around each voxel 
and combining observations within the neighbors with weights. Our approach differs 
in that we aim to model all the voxels in an image tensor jointly while incorporating 
the intrinsic spatial correlations among the voxels. Finally, there have been some recent 


developments regressing a scalar response on a tensor predictor (Reiss and Ogden, 2010 


Zhou et al., 2013 [^ou and Llf 2014 Goldsmith et al., 2014 Wang et ah, 2014). By 


contrast, our proposal reverses the role by treating the image tensor as response and the 
vector of covariates as predictors. The two treatments yield different interpretations. 
The former, the tensor predictor regression, focuses on understanding the change of a 
clinical outcome as the tensor image varies, so may be used for disease diagnosis and 
prognosis given image patterns. The latter, the tensor response regression, aims to study 
the change of the image as the predictors such as the disease status and age vary, and 
thus offers a more direct solution if the scientihc interest is to identify brain regions 
exhibiting different activity patterns across different groups of subjects. In addition, the 
technique proposed in this article is completely different from the techniques used in 
tensor predictor regression, and as we will later show in the simulations, tensor response 
regression exhibits a more competitive hnite sample performance compared to tensor 
predictor regression when the sample size is small. 

In this article, we propose a parsimonious tensor response regression model and 
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develop a novel estimation approach. Specifically, we continue to impose a linear asso¬ 
ciation between the tensor response and the predictors. Meanwhile, we adopt a form 
of sparsity principle by assuming that part of the tensor response does not depend on 
the predictors and does not affect the rest of the response either. Adopting this prin¬ 
ciple effectively reduces the number of free parameters, leads to a parsimonious model 
with improved interpretation, and yields a coefficient estimator that is asymptotically 
efficient. This principle can finds its natural counterpart in the sparsity principle in re¬ 
gression and variable selection with high-dimensional predictors, where only a subset of 
variables are believed to be relevant to the response. However, our proposal significantly 
differs from the popular sparse model estimation and selection in several ways. While 
the usual sparsity principle frequently adopted in variable selection assumes a subset of 
individual variables are irrelevant, we assume that the linear combinations are irrelevant 
to regression. Rather than using Li type penalty functions to induce sparsity, as is often 
done in variable selection, we employ a nascent technique called the envelope method 
(Cook et ah, [2010 ) to estimate the unknown regression coefficient. Moreover, whereas 
most sparse modeling treats variable selection and parameter estimation separately, our 
envelope method essentially identifies and utilizes the material information in a joint 
estimation manner. We develop a fast estimation algorithm and study the asymptotic 
properties of the estimator. We demonstrate through both simulations and real data 
analyses that the new estimator improves dramatically over some alternative solutions. 

The contributions of this article are multi-fold. First, it addresses a family of ques¬ 
tions of substantial scientific interest but with relatively few solutions. Our proposal 
offers a systematic approach to jointly model all elements of a tensor response given 
a vector of predictors. A particular application is to compare tensor images across 
groups adjusting for other covariates, which is of central interest in neuroimaging analy¬ 
sis. Second, while existing regularization solutions largely rely on penalty functions, our 
envelope based method provides an alternative way of introducing regularization into 
estimation. It complements the usual penalty function based solutions, and usefully 
expands the realm of regularized estimation in general. Moreover, our method can be 
naturally coupled with penalty functions for further regularization. Third, our work 
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advances the recent development of envelope method that was hrst proposed by Cook 


et al. 

(2010 

) then further developed in a series of papers ( 

Su and Cook 

2011 

2012 

2013 


Cook et al., 2013, 2014 Cook and Zhang, 2014c|[b ). Whilst all existing envelope meth¬ 
ods concentrate on a scalar or vector response, our work differs obviously by tackling a 
tensor response. Such an extension is far from trivial, and new techniques are required 
throughout its development, even though to make our proposal easier to comprehend, 
we have chosen to present our method in a way that is parallel to that for a vector 
response. Furthermore, since the envelope methodology is new and sometimes uneasy 
to follow, we strive to connect it with the more familiar sparsity principle and clearly 
outline its assumptions, gains and limitations. 

The rest of the article is organized as follows. Section [previews key tensor notations 
and operations, and summarizes the envelope method for multivariate vector response 
regression. Section proposes tensor response linear model, the generalized sparsity 
principle, then the concept of tensor envelope. Section develops two estimators, and 
Section studies their asymptotic properties. Simulations and real data analyses are 
presented in Sections and followed by a discussion in Section All technical proofs 
are relegated to the Supplementary Materials. 


2 Preparations 

2.1 Tensor notations and operations 

We begin with a quick review of some tensor notations and operations that are to be 


intensively used in this article. See also Kolda and Bader (2009) for an excellent review. 

Multidimensional array A G is called an mth-order tensor. The order of 

a tensor is also known as dimension, way or mode. A fiber is the higher order analogue 
of matrix row and column, and is dehned by hxing every index of the tensor but one. A 
matrix column is a mode-1 hber and a row is a mode-2 hber. 

The vec(A) operator stacks the entries of a tensor into a column vector, so that 
an entry of A maps to the j-th entry of vec(A), in which j = 1 -|- — 

l)nfc'A\rfc/. The mode-k matricization, A(k), maps a tensor A into a matrix, denoted 
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by G so that the (ii,..., im) element of A maps to the {ik,j) element 

of the matrix A(fc), where j = 1 + - 1) ]\k"<k',k”¥^k^k"- The k-mode product 

of a tensor A and a matrix C G results in an mth-order tensor denoted as 

A XkC E x---xrfc_jxsxrj;+jx---xr™^ where each element is the product of mode-/c hber 
of A multiplied by C. Similarly, the k-mode vector product of a tensor A and a vector 
c G IR^*^ results in an (m — l)th-order tensor denoted as Ax^ c G ^ 

where each element is the inner product of each mode-fc hber of A with the vector c. 

The Tucker decomposition of a tensor is dehned as A = G xi X 2 ■ ■ ■ Xm 
where C G ]R“j is the core tensor, and G 1R^'=^“'=, k = 1,... ,m, are the factor 
matrices. It is a low-rank decomposition of the original tensor A. For convenience, the 
Tucker decomposition is often represented by a shorthand, |C; ..., F^™'^]. 


2.2 Multivariate response envelope model 

Next we briehy review the multivariate linear model with vector-valued response, along 
with some key concepts of envelope, and with two goals in mind. First, it is to assist 
with a better understanding of the envelope methods in general, and second, to facilitate 
the construction of envelopes for tensor-valued response regression. 

We start with the classical multivariate response linear model. 


Y = l3X + e, 


( 1 ) 


where Y G IR^ is a response vector, X G IR^ is a predictor vector, (3 G IR'"^^ is the 
coefficient matrix, while the intercept is set to zero by centering the samples of Y and 
X, and e G IR'’ is the i.i.d. error that is independent of X. It is often assumed that 
£ follows a multivariate normal distribution with mean zero and covariance S G IR^^'’, 
S > 0, though normality is not essential. 


The envelope model (Cook et ah, 2010) builds upon a key assumption that some 


aspects of the response vector are stochastically constant as the predictors vary. In 
other words, part of the response is irrelevant to the regression. More rigorously, we 
assume there exists a subspace S of IR'’ such that 


QsY\X ~ QsY, QsY ^PsY\X, 


( 2 ) 
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where Ps is the projection matrix onto 5, Qs = Ir — Ps is the projection onto the 
complement space iS-*-, ~ means identically distributed, and JL means statistical in¬ 
dependence. To better understand this assumption, we introduce a basis system of S. 
Let r G IR'’^" denote a basis matrix of S, where u is the dimension of 5, u < r, and 
To G a basis of S-^. Then (|^ essentially states that the linear combinations 

G 1R'’““ are immaterial to the estimation of (3 in that it responds neither to changes 
in the predictors nor to those in the rest of the response G 1R“. Correspondingly, 
carry all the material information in Y^ and intuitively, one can focus on in 
regression modeling. 

We remark that, assumption 0 . although looks somewhat unfamiliar, can hnd its 
natural counterpart in the well known and commonly adopted sparsity principle in clas¬ 
sical variable selection, where only a subset of variables are assumed to be relevant to 
regressions. The two assumptions, at heart, share exactly the same spirit that only part 
of information is deemed useful for regressions and the rest irrelevant. However, they 
are also different in that, whereas the usual sparsity principle focuses on individual vari¬ 
ables, ([^ permits linear combination of the variables to be irrelevant. For this reason, 
we term assumption (|^ as the generalized sparsity principle. Compared to the usual 
sparsity principle, it is more flexible, but could lose some interpretability. 

To see how the generalized sparsity principle would facilitate estimation of (3 in model 
Q, we note that the following decompositions hold true under ([^. 

span(/3) C S and S = Yaj:{PsY) + var(Q 5 l^) = PsT,Ps + Q 5 SQ 5 , 

where span(/3) is the column subspace of /3, i.e. the subspace spanned by the columns of 
f3. Accordingly, we can rewrite the above decompositions in terms of the basis matrices, 

(3 = re and 1: = TnT^ + TonoTl, (3) 

where 0 = (3 G 1R“^p denotes the coordinates of f3 relative to the basis F, = 

cov(F^l^ I ^ 1R“^“ is the material variation, and fig = cov(FgW | X) = cov(FqX) G 

is the immaterial variation that contains no information about W|X, but 
only brings extraneous variation in estimation. 
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Given the first result of (|^, we note that model ([^ can be rewritten as 


T Y = 0X + re, and = Tje. 


( 4 ) 


In turn Q implies that regression modeling can now focus on the material part 
only. The effective number of parameters in model Q is reduced from pr + r(r + l)/2 
without assumption (|^, to pu + (r — u)u + u{u + l)/2 + (r — u){r — u + l)/2 with the 
assumption, and the difference is p{r — u). 

Given the second result of ([^, Gook et ah ( 2010[ ) showed the gain in estimation effi¬ 
ciency for /3. Let /3 env denote the estimator of /3 in ([^ under ([^, /3 qls the ordinary least 
squares estimator without imposing assumption ([^, and /3 true true value of /3. Then 
it is shown that, both y/n |vec(/3QLs) “ V6c(/3.j.j^ug)| and y/n |vec(/3j;j^v) — vec(/3.j,p,uE) 
converge to a normal vector with mean zero and covariance matrix. 


avar 


n vec(/3oLs) r = ® ^^cl avar y/n vec(/3j;Nv) r = ® (rfir"^) + A, 


respectively, where Sx = cov(X), the hrst term in SLvax{y/n vec(/3gj^y)} corresponds 
to the asymptotic variance of the estimator given that S is known, and the second 


term A is the asymptotic cost of estimating S. While Gook et al. (2010) showed that 


avar{-y/n vec(/3gj.jy)} < mai{y/n vec(/3QLs)} general, in the light of decomposition of 
S in (j^, it is straightforward to see that the hrst term in avar{-y/n vec(/3j;j^v)} is to 
be substantially smaller than avar{-y/n vec(/3QLs)}; if immaterial variation roLioTj 
dominates the material variation 

Finally, we address the issue of existence and uniqueness of 5 in ([^. The subspace 
S always exists, as it can trivially take the form of IR*”. However, S is not unique. Then 
the idea is to seek the intersection of all subspaces that satisfy (|^, which is minimum 


and unique. Toward that end, Gook et al. (2010) gave two generic dehnitions. 


Definition 1. A subspace TZ C IR?' is said to be a reducing subspace of M G IR^^^ ifTZ 
satisfies that M = P'jiMP'ji + 

Definition 2. Let M G IR^^^ and B C span{M). Then the M-envelope of B, denoted 
by £m{B), is the intersection of all reducing subspaces of M that contain B. 















Given those definitions, we see that any subspace S satisfying (|^ under model Q is a 
reducing subspace of S, and the intersection of all such subspaces is the S-envelope of 
B = span(/3). This envelope £-e{B) is also denoted by and uniquely exists. In our 

envelope based estimation, we seek the estimation of Ss{f3) so to improve estimation of 
the coefficient matrix /3. 


3 Models 


3.1 Tensor response linear model 

When facing a tensor response, we develop a model in analogous to the classical mul¬ 
tivariate model Q. That is, for an mth order tensor response Y G IR^^and a 
vector of predictors X G IR^, consider the tensor response linear model 


Y = Bx 


(m+l) 


X 


£. 


(5) 


In this model, the coefficient B G IR^^ x---xr™xp jg (m-|- l)th order tensor that captures 
the interrelationship between Y and X and is the parameter we are primarily interested 
in. X(m+i) is the (m -|- l)-mode vector product. The intercept term is again omitted 
without losing any generality. The error e G IR'’^ is an mth order tensor that is 

independent of X and has mean zero. Furthermore, we assume that s has a separable 
Kronecker covariance structure such that cov{vec(£)} = S = ••®Si,Sfc>0,/c = 

1,..., m. This separable covariance assumption is important to help reduce the number 
of free parameters in S, which is part of our envelope estimation. Meanwhile, this 
separable structure has been fairly commonly imposed in the tensor literature; see for 


instance, Hoff (2011); Fosdick and Hoff (2014). Here to avoid notation proliferation, we 
continue to use S to denote the covariance matrix, as it should not cause any confusion 
in the context. The distribution of vec(£) is assumed to be normal, which enables 
likelihood estimation. However, normality is not essential, and moment based estimation 
can replace likelihood estimation when the normality assumption is in question. 

Two special cases of model (|^ are worth of brief mentioning. The first is when X 
is a scalar and takes the value of only 0 or 1. In this case, B reduces to an mth order 
tensor that can be interpreted as the mean difference of the tensor coefficients between 
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the two populations. The second case is when m = 1, where the response Y becomes 
a vector, and model (|^ reduces to the classical multivariate linear model Q. In this 
case, BX(^rn+i)^ becomes the inner product of each mode-2 hber (i.e., row) of B with 
X, which in turn is the usual matrix product of B and X. 

Next we consider an alternative tensor response linear model (|^, 

vec(Y) = BJ^^^^X + vec(£). ( 6 ) 

This model can be viewed as the vectorized form of model ([^. However, the main 
difference is that no separable covariance structure is imposed on the error term e in 
(§. The coefficient matrix . 6 (^+ 1 ) € pg interpreted as the mode-(m-|- 1) 

matricization of the tensor coefficient .B in (|^. Each column of B(m+i) is a p x 1 coef- 
hcient vector that characterizes the linear relationship between each individual element 
of Y and the predictor vector X. Therefore, estimating B(m+i) in (§ is equivalent to 
estimating B in ([^ by htting individual elements of on X one at a time. We call 
this estimator the ordinary least squares estimator of B, and denote it by Bqls- Given 
the data {(X,, it has an explicit form 

where X G IR^^" and Y G x...xr™xn stacked predictor matrix and response 

array, respectively. If vec(£) is further assumed to be normally distributed, then the 
above OLS estimator is also the maximum likelihood estimator based on model ([^. 

3.2 Generalized sparsity principle 

For a tensor response, we expect a similar generalized sparsity principle like (|^ to hold 
true. It is probably more so than the vector response case, as intuitively it is reasonable 
to expect certain regions of the tensor response to be immaterial. More specihcally, 
suppose there exist a series of subspaces, Sk C 1 R'''=, A; = 1 ,..., m, such that 

Y Xfc Qfc|X ~ X XkQki Y XfcQfcJiX XkPk\X., A; = l,...,m, (7) 

where Pk G is the projection matrix onto Sk, Qk = Ir^. — Pk ^ jg ^-pg 

projection onto the complement space S^, and Xk denotes the /c-mode product. Then, 
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the first condition in Q essentially states that Y XkQk does not depend on X, while the 
second condition in Q says x*. Qk does not affect the rest of the response, Y Xk Pk, 
and there is no information leak between Y Xk Qk and Y Xk Pk- As such, we call 
Y Xk Qk fhe immaterial information to the regression of Y on X, and call Y XkPk the 
material information. Combining the statements in ([^ for all /c = 1,..., m, we arrive 
at a parsimonious representation: 

Q(Y)|X ~ Q(r), Q(r) XP(1^)|X, 

where Q(l^) = G IR*-^x-x’'™, and P(r) = {Y; Pi,..., Pmj G IR''^x...xr-™^ ^ ^ ^ 

Tucker decomposition with Y as the core tensor, and Pi,, Pm as the factor matrices 
along each mode. These two operators provide a decomposition of 1^, = P(l^)+Q(l^), 

into the material part P(l^) and the immaterial part Q(l^). 

Introducing (|^ to the tensor response linear model ([^, we have the following results. 

Proposition 1. Under the tensor response linear model (|^, the assumption Q is true 
if and only if 


B Xk Qk 0 and Pk^^kPk T Qk^^kQk} h 1,..., m. 

To turn the above decompositions of B and into a basis representation, let G 
IR'''=^“'= denote a basis for Sk, where Uk is the dimension of Sk, and Foa: G ]R’"»:X(rfc-«fc) 
denote the complement basis, k = 1,... ,m. Let Qk G IR“'=x“t and fiofc € IR(^'=-“fe)x(''fc-“fc) 
denote two symmetric positive dehnite matrices. Then we have Pk = F^F^, Qk = 
FofcFj^, plus the following parameterization for B and S = (8) • • • ® Si. 

Proposition 2. The parameterization in Proposition is equivalent to the following 
coordinate representations: 

B = |©;Fi,...,F™,/p] /or some © G 1R“^^•■•^“™^^ 

Sfc = F^ + rofcLiQAjFg^, k = 1,..., m. 

Accordingly, one can rewrite the material response part P(l^) in the following way 

P(l") = |r; FiFl,..., F^F;,] = 111-; F^,..., F^,]; Fi,..., F J, 
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where the core tensor is Z = ; F^,. .., F^J G ]R“^ x-xum_ ggg that, by recognizing 

and focnsing on the material part of the tensor response P(l^), the regression modeling 
can now focus on the core tensor Z as the “surrogate response”, which plays a simi¬ 
lar role as F^l^ in (|^ in the vector response case. Meanwhile, the key parameter to 
estimate becomes © G along with {^k}'k=i {flok}'k=i- Con¬ 

sequently, the number for free parameters reduces from p nr=i Tk + Yjk=irk{rk + l)/2 
to P nr=i “fe + Y2=i{'^k{rk -Uk)+Uk{uk + l)/2 + (rfc - Uk){rk -Uk + l)/2}, and in effect 
saving p{nr.i>'/.-nr= -I'Wfc} parameters. With Uk usually being much smaller than r^, 
substantial dimension reduction is achieved, which in turn improves the estimation. 

3.3 Tensor envelope 

Similar to the vector case, we next develop the notion of tensor envelope for tensor 
response model ([^ to attain uniqueness of the subspaces Sk in the generalized sparsity 
principle Q. Unlike the vector case, however, there are two different ways to construct a 
tensor envelope. We will dehne the new concept in one way, then establish its equivalence 
with the other. Moreover, we will lay out the difference between the proposed tensor 
envelope and the vector envelope that is constructed based on the vectorized model ([^. 

One way to establish the tensor envelope for model ([^ is to recognize that it should 
contain span(.B^^^^^j), meanwhile it should reduce the covariance S and respect the 
separable Kronecker covariance structure that S = 8) • • • ® Si. Then following 

Dehnitions [T] and 1^ we come to the next dehnition of the tensor envelope. 

Definition 3. The tensor envelope for B in the tensor response linear model ([^, de¬ 
noted by is defined as the intersection of all reducing subspaces £ of H that 

contains span{B'^^_^_^.^) and can be written as £ = £m ® ^ £i, where £k C IR’"'', 

A; = 1,..., m. 

Following this dehnition, we see that Ty,{B) is minimum and unique, and is of central 
interest in our envelope based estimation of B. Moreover, under the special case that 
m = 1 and the response is a vector, the tensor envelope Ts{B) reduces to the envelope 
notion £s{B) for the vector response. 
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An alternative way to construct the tensor envelope is by noting that, due to the 
decomposition in Proposition]^ one can construct a series of envelopes, {Bik)), for 
k = 1,... ,m, that satisfy the generalized sparsity principle Q under model ([^. That 
is, {B{k)) is the smallest subspace Sk that contains span(S(fc)) and reduces 
fc = 1,..., m. Then one can construct a tensor envelope by combining (Ts*. (-B(fc)) 
to capture all the material information in the response. Naturally, the two ways of con¬ 
structing the tensor envelope are well connected, due to the next equivalence property. 

Proposition 3. The tensor envelope as defined in Definition^ satisfies that Ty,{B) = 

{B{ra)) 0 • • • <8) Tsi (^(1)) ■ 

Our estimation of the tensor envelope utilizes this result by hrst estimating a basis of the 
individual envelope {B(k )), then combining them by Kronekcer product to construct 
an estimate of the tensor envelope. 

Finally, we remark that, in principle, one can develop an envelope for the ordinary 
least squares estimator in model ([^ as well. By analogy to the envelope dehnition for 
a vector response, this envelope also contains span(.B^^^^^), and thus we denote it by 
^'s{B(^rn+i))- However, there are some important differences between £s{B(^rn+i)) and 
the tensor envelope T^{B) in Dehnition]^ First, Ts(.B(m+i)) does not take into account 
the separable covariance structure, nor can be decomposed into the Kronecker product 
of the individual envelopes [B(^k))- Second, the computation involved in estimating 
^^s(-B(m+i)) is prohibitive, as it replies on the estimation of the unstructured covariance 
matrix of vec(£) G lRnr=jBy contrast, the computation of Ts^B) is much 
more feasible, as we demonstrate in the next section. 

4 Estimation 

Our primary target of estimation is B in the tensor response linear model (|^. Our pro¬ 
posal is to estimate B through the tensor envelope Ts{B)^ which also involves estimation 
of S = Sm ® ® Si. The objective function to minimize is of the form, 

n 

t{B, E) = log |E| + n-^ {''“W) - sy+„A:.}"E-' {vec(V) - S(„+i)A:.} . 

i=l 
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Algorithm 1 Iterative optimization algorithm for minimizing i{B, S). 

[1] Initialize and ® ® S® 

repeat 

[2] Estimate envelope basis given B^^'^ and 

[3] Estimate parameters given 

[4] Update and ® ® 

until the objective function converges 


It is straightforward to verify that this objective function, aside from some constant, 
is the negative log-likelihood function of the model (|^ if one assumes that the error 
follows a normal distribution. By adopting Q then the parameter decompositions in 
Proposition the minimization of i{B, S) becomes estimation of the envelope basis 
Ffc G the reduced coefficient © G iR“ix--x“™xp^ and the matrices flk G IR^^xm^ 

and fiofc G ]RP'=““'=)x('';=-“*:)^ Here, with a slight abuse of notation, we 

continue to denote the dimension of the individual envelope {B(k)) as Uk- 

We present two solutions, one an iterative estimator and the other a one-step es¬ 
timator. The first solution alternates among steps of estimating one parameter given 
the rest fixed. It leads to a maximum likelihood estimator when the error follows a 
normal distribution and is a moment estimator otherwise. The second solution requires 
no iteration, and is essentially an approximate estimator, but it enjoys several appealing 
properties, both computationally and theoretically. 


4.1 Iterative estimator 


We first summarize our iterative optimization of i{B, S) in Algorithm We then give 
details for each individual step. Updating equations in each step are carefully derived 
as partial maximized likelihood estimators under the normal error assumption, with the 
detailed derivation given in the Supplementary Materials. As a result, the objective 
function is monotonically decreasing, guaranteeing the convergence of the algorithm. 

The first step of Algorithm [T] is to initialize B and S. For B, a natural initial es¬ 
timator is the OLS estimator .Bqls in 0- That is, we fit each element of the tensor 
response Y on X one at a time, and set the resulting estimator as the initial estima¬ 
tor B^^^ = Bqi^q. For S, we employ the covariance estimator of Dutilleul (1999) and 
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Manceur and Dutilleul (2013). That is, for k 


1,..., m, in turn, we set 






n 



(0) N-1 


(S^i) 



where ej(fc) is the mode-fc matricization of the residual, Ci = Yi — X(m+i) Xi,i = 
1,... ,n, for n sample replications, and the iterative update of each covariance S® given 
the rest starts with S® = j ^ k. One can verify that the above estimator is the 
maximum likelihood estimator if the error in (|^ follows a normal distribution. We also 
remark that, the individual component is not identihable. To address this issue, we 
normalize by dividing the term by its Frobenius norm, and update the hnal estima¬ 
tor = rS® ® ® where the scalar r = {nWj 

•••(8(sS°Vi}vec(e,). 

The second step of Algorithm is to estimate the envelope basis Update 

of given the current estimates B^^'^ and can be achieved by minimizing the 

following objective function, subject to the constraint that G\Gk = 


/<'>(G*) = log \GIM”G,\ + log \Gl(Ni 


it), 


rith-lG, 


( 8 ) 


where 


jf^k '^3/ 


-‘Er.i<5h,{(sW)-' 






hd'i-i, 


8 ® ••• ( 

W 'i-i, 




and is the htted envelope model residual were the envelope basis 


{r^}™ known: df = 1^-|Bols; Pt, , 


it) 


pit) T pit) 




where = r^^iT^j'^Y is the projection onto subspace span(r^''’) and is the mode-j 




cW 


matricization of <5f h Then we have = arg mincj, fY’ (Gk) subject to the orthogonal 

constraint G].Gk = lu^- This optimization is over all x Uk dimensional Grassmann 
manifolds since fY\Gk) = fk\GkO) for any orthogonal matrix O G lR“*^“fc. 

The third step of Algorithm [T] is to update © and and fiofc given the current 
estimate of the envelope basis {Ffc}^^. We hrst note that © can be estimated by 
regressing the core tensor, Z = |1^; r{,..., F^J G on the predictor X 

through ordinary least squares without any constraint. That is. 


it). 


©h+i) = zWx(,,+i){(xx")-'x}, 
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where Zf = |1^; ..., G ]R“^x-x“™, and g IR“^x...x«„xn 

array stacking to Zn'^. It is noteworthy that the dimension of the tensor response 
in this step is reduced from of Y to of Next we estimate flk, again 

using the iterative approach of [Dutilleul (1999) and Manceur and Dutilleul (2013). 

1 




(*+i) _ 


nil 


j¥=k i=i 


E 


,(i) 

i(k) 


-1 


(i+l)s _1 




mr 






(f + l)N _1 




where sf'^ = zf'^ — ©h+i) X(m+i) Xi is the residual from the regression of Z'^^^ on X, 
and is its mode-fc matricization. We remark that the above estimation of is 
parallel to the iterative updating of S® during the initialization. The only changes are 
to replace Cj with to replace with and to replace the starting of iteration 

Ir^. with Next we estimate fiofc using the formula, 


1 ^ 

= rn-^E(hr’rxw{(s£'r‘0...0(E<'>,)-' 

0(ew,)-‘ ®... 0 (sW)-‘} 

where G IR^^xCr^-ix^) jg orthogonal completion of G such that 

(fO+i)^ Fh+i)) jg orthogonal basis of IR’'*^. We also note that, unlike {Q,k}]^=n the esti¬ 
mation of requires no iteration, since it is only based on the current estimator 

SW and 

Finally, we update B through its parameterization B = |0; Fi,..., F^, IpJ in Propo¬ 
sition!^ We are able to obtain the explicit formulae for such an update, i.e.. 


B<'+»=Yx,p»+‘' X2... x„p^y> x,„+i, {(xxT'x} = [BoLs;h‘yk...R;y’.^pi. 


where X and Y are dehned in .Bqls in ((^- Then we update the covariance S as 


i(i+l) 


{t+1) /■p{i-l-l)\T 


= F^ (r 


r«and s(‘+‘i = e!;+‘' 




“Ok 


Ok 


4.2 One-step estimator 

Algorithm is iterative, and steps 2 to 4 are repeated until the objective function con¬ 
verges. Although our numerical experiences suggest that the estimate often does not 
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Algorithm 2 Moment-based algorithm for minimizing 
for s = 0,..., Wfc — 1 do 

Set Gl = 0 if s = 0 and Gl = {gki, • • •, Qks) otherwise 
Construct as an orthogonal basis complement to Gl in 1R'’'= 

Solve the objective function over w G IR*”"® subject to w^w = 1: 

Wk+i = argimnlog tuj + log u;|. 


Set Qk+i = G^i^Wk+i G 1R'’'= and normalize to unit length 

end for 


vary signihcantly after only a few iterations, the computations involved can still be in¬ 
tensive. In this iterative procedure, we recognize that the major computational expense 
arises from Step 2 that estimates the envelope basis by optimizing the ob¬ 

jective functions fk\Gk) in ([^ over the Grassmann manifolds. This is partly because 
for k = 1,..., m, depend on each others’ minimizers, so that the step of estimation 
of the envelope basis requires iterations within itself. Moreover, the Grassmann opti¬ 
mization is non-convex and possesses multiple local minima, and as such the algorithm 
usually requires multiple starting values. 

Next we propose an alternative estimator that adopts the same framework of Algo¬ 
rithm but replaces Step 2 with an approximate solution by introducing a modihed 
objective function than ([^. We then restrain the new estimator to be non-iterative, in 
that it goes through the steps of Algorithmic only once. Specihcally, the new objective 
function is of the form, 

/;“>(GO=log|GiS<‘>G,| + log|G;(ArP)-‘Gj|. (9) 

Comparing the two objective functions, the term JVfP in (|^ is replaced by in 

0, It is also interesting to note that, for the hrst round of iteration, the term 
would take the initial value Irj, and as such the term becomes the OLS residual 
e, = Y,- sB(o) X(^+i) X,. Gonsequently, = sf, and thus ff\Gk) = fi''\Gk). 

This new objective function (j^) has some appealing features. First of all, estimation 

*(t) 

of the envelope basis through does not depend on the values of other envelope 
basis Fj, j ^ k. Therefore, estimation of Fi to F^ becomes separable. This property 
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alone could increase computational efficiency substantially. Second, the optimization 
of can be achieved by the sequential algorithm recently proposed by 


Cook and 


Zhang (2014a), which is much faster and more stable than the Grassmann optimization. 


and the estimation result does not hinge on the initial guess. For completeness of the 
presentation, we summarize this sequential algorithm in Algorithm 

Thanks to both the modihed objective function as well as the non-iterative fashion of 
the optimization, the resulting one-step estimator is computationally much faster than 
the iterative estimator. Our numerical studies have found that it has a competitive hnite 
sample performance. Moreover, as we show in the next section, like its iterative coun¬ 
terpart, this one-step estimator remains a consistent estimator of the true parameters. 
Therefore, we recommend the one-step estimator in practice. 


5 Asymptotics 

In this section we study the asymptotic properties of the envelope based estimators as 
the sample size goes to inhnity. We investigate both the iterative estimator, denoted as 
and the one-step estimator, denoted as under two scenarios: the normality 

of the error distribution holds or does not hold. 

5.1 Consistency 

We hrst establish that, under fairly weak moment conditions, both the iterative estimator 
and the one-step estimator are y^-consistent, when the error term in the tensor response 
linear model (|^ does not necessarily follow a normal distribution. We note that the 
consistency is established in terms of the projection matrices, for the iterative 
estimator and for the one-step B°l,y, since a subspace can have inhnitely 

many semi-orthogonal basis but only one unique projection matrix. 

Theorem 1. Assuming vec(£j), i = 1,... ,n, in model ([^ are i.i.d. with finite fourth 
moments, then the projection and Pp® are both y/n-consistent estimators for the 
projection onto the envelope Ts*. (P(fc)). The corresponding estimators and B°%y 

both converge at rate-y/n to the true tensor coefficient Bt^ue- 
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5.2 Asymptotic normality 

We next establish the asymptotic normality of the iterative estimator when the 

error term vec(£) follows a normal distribution. Since only the iterative estimator is to be 
examined, we abbreviate its notation simply as Sbnv, and the corresponding projection 
as Prfc- Under this condition, this envelope based estimator becomes the maximum 
likelihood estimator (MLE). 

Theorem 2. Assuming vec(£j), i = in model ([^ are i.i.d. with a normal 

distribution, then the projection Pp* is the MLE for the projection onto the envelope 
(-S(fc)) ■ The corresponding estimator Benv is the MLE, and y/nyec{BENv—BTRUE) 
N{0,Uenv)- Moreover, the OLS estimator Bqls satisfies that y/nYec{BoLs — Btrue) 
A^(0, UoLs), and Uenv < U OLS- 

In addition to the established asymptotic normality, an important Ending of Theo¬ 
rem is that the asymptotic variance of the envelope estimator Pbnv is no greater than 
that of the OLS estimator Pqls- Therefore, Penv is asymptotically more efficient than 
Pols- One can conveniently obtain the asymptotic covariance of vec(PoLs); 

Pols = Sx 8) S = Sx^ 0 O • • • (g) Si, 

where Sx = cov(X). Next we give two additional results to gain more insight of 
Penv One assumes that the envelope basis is known a priori, and the other obtains the 
asymptotic variance of the estimator for both B and S jointly. 

We first assume the envelope basis is known, and denote the corresponding envelope 
estimator of B as Pp. We then compare its asymptotic variance with that of Pqls- 

Theorem 3. Under the same conditions as in Theorem Pp is ^/n-consistent and 
asymptotically normal. The asymptotic covariance o/vec(Pp) is 

Ur = Sx^ ® Pp^S^Pp^ ® • • • 0 PriSiPp, = Sx^ ® ® ® TiUhrl. 

Recall the decomposition S^ = -|- TokOokTlf,, /c = 1,...,m, in Proposition]^ 

Then it is straightforward to see that Pp < Pols, and the more dominating the imma¬ 
terial variation FofcLiofcrofc compared to the material variation the bigger the 
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difference is between Ur and C/oLs- This result agrees with the pattern we have observed 


and reviewed in Section [2^ for the vector response case, and shows the explicit gain of 
the envelope estimator in terms of estimation efficiency. 

We next compare the asymptotic covariance of the envelope estimator and the OLS 
estimator when the envelope basis is unknown. Intuitively, there is an extra term in 
the covariance of the envelope estimator as the cost of estimating the unknown envelope 
basis. Toward that end, we introduce the following notations. 


h = 


hi 

h.2 


vec(B) 

vech(S) 


0 = 


f \ 

02 

0m+l J 


( vec{B) ^ 
vech(Si) 

vech(Sm) / 




( \ 

\ ^3m+l / 


where the operator vech(-) : i—>■ stacks the unique entries of a symmetric 

matrix into a column vector, = vec(©), = {vec(rfc)}^=i, = 

{vech(f2fc)}^]^, and {$,j}^= 2 m +2 = {vech(f2ofc)}^i- It is interesting to note that the 
total number of free parameters is reduced from h, to 0 by YYk=Uk(\Xk=i'rk + l)/2 — 
+ l)/2 because of the imposed separable Kronecker covariance structure, 
and is further reduced from 0 to ^ by 'p(\Yk=i f'k — We also note that h 

is an estimable functions of 0 and respectively, and thus we write h = h{4>) = 
h{^). Let Jfi denote the Fisher information matrix for h, let H = dh{4>)/d4>, and 
K = dh{^)/d^. The explicit forms of Jh,H and K are given in the Supplementary 
Materials. Furthermore, let h.oLs denote the OLS estimator of h, h^Nv be the envelope 
estimator, and be the true parameter. Then, we have the following result. 


Theorem 4. Under the same conditions as in Theorem both Hqls and Henv are 
yjn-consistent and asymptotically normal, so that y/n{hoLs — hj-RUE) —^ N{0,Vols), 
where Vqls = and y/n{hEr,y - Htrue) -t N{0,Venv), where Venv = 

K{KUhKyK\ Moreover, 


r OLS ' 


Once again, the envelope estimator is asymptotically more efficient than the OLS es¬ 
timator. On the other hand, the envelope estimator of B and S are asymptotically 
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correlated. As such, there is no explicit form for the asymptotic covariance of 
except that it is the p(n x pdl upper-left block of T4 nv) when the envelope basis 
is unknown. This is different from the vector response case. 

In applications such as brain imaging analysis, it is often useful to produce a voxel- 
by-voxel p-value map, so one can visually identify subregions of brains that display 
distinctive patterns between disease and control groups. Given the results of Theorems]^ 
and 1^ we can produce such a p-value map for our envelope based estimator Benv Iu 
principle, one can obtain its asymptotic covariance C/env by extracting the upper-left 
block of Venv Iu practice, however, we suggest to substitute L^env with f/oLS! which 
is computationally much simpler, though it would lead to more conservative p-values. 
Once the p-values are obtained, one can further employ either simple thresholding or 
false discovery rate correction. 


6 Simulations 


In this section, we report simulations to compare the new estimator with two major 


competitors, the one-at-a-time OLS estimator (Section 6.1), and the tensor predictor 
regression of Zhou et ah ( 2013[ ) (Section |6.2[ ). It is noteworthy that, in the hrst compar¬ 
ison, the data was simulated from a model that conforms with the envelope structure, 
whereas in the second comparison, the model does not follow this structure. Therefore, 
the numerical experiment also demonstrates the performance of our estimator under 
model misspecihcation. Moreover, we investigate the effect of the envelope dimension 
and of the magnitude of immaterial variation on the coefficient estimation (Section |6.3[ ), 
and examine the case when the response is a three-way tensor (Section |6.4[). 


6.1 Comparison with OLS estimator 

We begin with a comparison with the alternative solution that dominates the literature, 
the OLS estimator, which hts one response component at a time. Specihcally, we consider 
the model of the form (|^, 

Yi = BXi + a£i, i = l,...,n. (10) 
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We set the sample size n = 20, a fairly small value, to mimick the common scenario of 
imaging studies with a small number of subjects. In this model, is a 64 x 64 matrix, 
Xi is a scalar that only takes two values, 0 or 1 , representing for instance the disease 
and control groups. B G represents the mean change of the response between 

the two groups. Elements of B are either 0 or 1, and is plotted in the hrst column of 
Figure [Tj We varied the shape of B among a square, a cross, a round disk and a bat 
shape. The constant a in front of the error term e was introduced to explicitly control the 
signal strength, and it took a value such that the signal-to-noise ratio (SNR) equals 0.01, 
0.1, and 1, respectively. The error e was generated from a matrix normal distribution, 
vec(£) ~ 1V(0, S 2 ( 8 )Si). To make the model conform to the generalized sparsity principle 
([^, we generated k = 1 , 2 , then normalized it by its 

Frobenius norm. We set the working envelope dimension ui = U 2 equal to the numerical 
dimension of the true coefficient matrix B, which is 1 for the square signal, 2 for the cross, 
8 for the disk, and 14 for the bat-shape. We eigen-decomposed the coefficient matrix as 
B = GDG^. Then we generated F^ = GOk, for an orthogonal matrix Ok G 
whose elements were from a standard uniform distribution. This way, it is guaranteed 
that span(.B) C span(Fi) and span(.B’^) C span(F 2 ). We then orthogonalized F^ and 
obtained Fofc. The covariances fifc G IR"*^"*^ and fiofc £ were generated 

of the form AA^, where A is a square matrix with matching dimension and all its 
elements were from a standard uniform distribution. 

Figure summarizes the estimated coefficient matrix B, under varying image shapes 
and signal strengths. It is clearly seen that the envelope estimator B^nv substantially 
outperforms the one-at-a-time OLS estimator .Bqls- For instance, when the signal is 
weak (SNR is 0.01 or 0.1), the OLS estimator failed to identify any meaningful signal, 
whereas the envelope estimator produced a sound recovery. Moreover, we emphasize 
that such a performance is achieved under a very small sample size (n = 20 ). 


6.2 Comparison with tensor predictor regression 


Next we compare our method with a recent proposal of tensor regression (Zhou et al. 


2013) that studies association between a scalar response and a tensor predictor. Even 
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Figure 1: Comparison with OLS: The true and estimated regression coefficient tensors 
under various signal shapes and signal-to-noise ratios (SNR). 


though both methods are motivated from neuroimaging analysis, and both involve a 
tensor variable in a regression analysis, the two clearly differ in the role the tensor plays in 
regression and the corresponding interpretation. Moreover, the techniques involved differ 
signihcantly too, with our approach utilizing the generalized sparsity principle, whereas 


Zhou et ah (2013) employed a reduced-rank decomposition, the canonical decomposition 


or parallel factors (CP decomposition), of the coefficient tensor. 
We consider the model of [Zhou et ah (2013), 


Yi = {B,Xi) +ei, i = 


( 11 ) 


where is a scalar response, X G jg image whose elements were all drawn 

from a standard normal distribution, and the error e is standard normal and independent 
of X. The coefficient matrix B G was set in the same way as in Section 

{B,X,) = (vec(S) , vec(X)) is the tensor inner product. We examined three sample 
sizes, n = 300,900 and 2700, respectively. 


6.1 
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Figure 2: Comparison with tensor predictor regression: The true and estimated regres¬ 
sion coefficient tensors under various signal shapes and sample sizes. 


Figure [^summarizes the results. It is interesting to observe that the enveloped based 
estimator outperforms the CP estimator of Zhou et ah (2013) when the sample size is 
small {n = 300) to moderate (n = 900), and underperforms only when the sample size 
is fairly large (n = 2700) but still produces a reasonable signal recovery. It is also 
noteworthy that, in this example, the data was generated from model ( [IT| , based upon 
which that the CP estimator was built. As a result, it actually favors the CP estimator. 
On the other hand, the generalized sparsity principle (j^ is not guaranteed in this setup. 
Therefore this comparison shows the promise of our envelope estimator even when the 
assumed envelope structure does not hold in the data. 


6.3 Envelope dimension and immaterial information 

We next investigate two issues: the effect of the working envelope dimension, 
which are the tuning parameters of our method, and the effect of magnitude of the 
immaterial information on our proposed envelope based estimation. 
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(a) Immaterial-to-material variation ratio cTq = 1 
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Figure 3: Effect of working envelope dimension and the strength of immaterial informa¬ 
tion on the envelope based estimator. 


For the hrst problem, we generated n = 100 i.i.d. samples from model (10) with the 
bat-shape signal, which is a natural shape and is relatively more complicated than the 
geometric shapes. We then varied the working envelope dimension ui = U 2 = u, with 
u = {5,10,15, 20, 25, 35, 64}, while the numerical rank of the bat-shape signal equals 14 
in this example. We also note that, if one sets the working envelope dimension Uk the 
same as the dimension of the tensor response r^, then the envelope estimator degenerates 
to the OLS estimator. Figure [^a) shows one snapshot of the results. We hrst see 
that, all the envelope estimators {u < 64) outperformed the OLS estimator {u = 64), 


reinforcing the pattern observed in Section 6.1 When the working envelope dimension is 
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smaller than the true signal dimension {u < 14 in this example), the envelope estimator 
produced reasonable but mediocre recovery. When the working dimension exceeds the 
truth {u > 14), the envelope estimator produced much refined recovery. Meanwhile, as 
the working dimension increases, there is a sign of overfitting, but the quality of the 
recovered signal remains competitive. 


For the second problem, we continued to employ model (10), but introduced an 
additional scalar cxo in the covariance + (jg Fofcriofcrgfc, where ao controls 

the magnitude of the immaterial information. Figure [^b) shows one snapshot of the 
results when Uq = 1000, whereas Figure a) had Ug = 1. Comparing the two figures, 
we first verify that, the more dominant of the immaterial information (i.e., the larger 
value of (To), the better performance of the envelope estimator. On the other hand, the 
OLS estimator continued to fail to identify any meaningful signal. This demonstrates 
the importance of recognizing the immaterial information to improve the estimation. 

6.4 Three-way tensor 

In the above simulations, we have primarily focused on the matrix-valued response 
(order-2 tensor) and a single predictor, since it enables a direct visualization of the 
coefficient estimator. In this section, we considered a tensor response model where the 
response becomes an order-3 tensor with dimensions (ri,r 2 ,r 3 ) = (20,30,40), and the 
predictor is a 5-dimensional vector. The rest of the setup was similar to that in Sec¬ 


tion 6.1 We simulated data with different envelope dimensions: (ui,U 2 ,U 3 ) = (2,3,4), 




Hr rii 2 

||-OenV -^11 

Average 

S.E. 

Average 

S.E. 

{ui,U2,U3) = (2,3,4) 

n = 100 

127 

0.07 

4.17 

0.04 

n = 400 

29.0 

0.03 

0.81 

0.01 

{ui,U2,U3) = (5,5,5) 

n = 100 

133 

0.03 

3.57 

0.01 

n = 400 

32.2 

0.05 

0.69 

0.01 

{ui,U2,U3) = (10, 10, 10) 

n = 100 

213 

3.68 

4.08 

0.40 

n = 400 

51.8 

1.98 

0.89 

0.20 


Table 1: Average and standard error (in parenthesis) of the estimation accuracy mea¬ 
sured by \\B — based on 100 data replications. 
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(5, 5, 5) or (10,10,10), and different sample sizes: n = 100 or 400. For each combination, 
we simnlated 100 data replications, and report the average and the standard error of 
II-SqlS ~ ip and II^ENv ~ -S|P ill Table Again, the envelope based estimator showed 
a dramatic improvement over the OLS estimator in terms of estimation accnracy. 


7 Real Data Analysis 

7.1 EEG data analysis 


We hrst analyzed an electroencephalography (EEG) data for an alcoholism stndy. The 
data was obtained from https : //archive. ics .uci . edu/ml/datasets/EEG+Database, 
It contains 77 alcoholic individnals and 44 controls. Each individnal was measnred with 
64 electrodes placed on the scalp sampled at 256 Hz for one second, resnlting an EEG 
image of 64 channels by 256 time points. More information abont data collection and 


some analysis can be found in Zhang et ah (1995) and Li et ah (2010). To facilitate the 
analysis, we downsized the data along the time domain by averaging four consecutive 
time points, yielding a 64 x 64 matrix response. We report both the OLS estimator and 
the envelope based estimator in Figure where the upper panels show the coefficient 
estimator and the lower panels show the corresponding p-value maps, thresholded at 
0.05. It is interesting to observe that, the envelope estimator identihes the channels 
between about 15 to 30, and between 45 to 60, at time range from 30 to 120, and from 
about 200 to 240, are mostly relevant to distinguish the alcoholic group from the control. 
By contrast, the OLS estimator is much more variable, and the revealed signal regions 
are much less clear. 


7.2 ADHD data analysis 

We next analyzed a magnetic resonance imaging (MRI) data for a study of atten¬ 
tion dehcit hyperactivity disorder (ADHD). The data was produced by the ADHD- 
200 Sample Initiative, then preprocessed by the Neuro Bereau and made available 
at http://neurobureau.projects.nitrc.org/ADHD200/Data.html. It consists of 776 
subjects, among whom 285 are combined ADHD subjects and 491 are normal controls. 
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Figure 4: EEG data analysis: top panels are estimated regression coefficients; bottom 
panels are the p-value maps, where the black regions correspond to the p-value less than 
the threshold 0.05. 


We removed 47 subjects due to the missing observations or poor image quality, then 
downsized the MRI images from 256 x 198 x 256 to 30 x 36 x 30, which is to serve as our 
3-way tensor response. The predictors include the group indicator (1 for ADHD and 0 
for control), the subject’s age and gender. Figure [^summarizes the Endings. While the 
OLS estimator reveals essentially no useful information, the envelope estimator shows 
clearly two regions that reflect distinctive activity pattern between the ADHD and con¬ 
trol subjects. One region corresponds to the cuneus and the other to the fusiform gyrus. 


and such Endings are consistent with the literature (Booth et ah, 2005 Solanto et ah 


2009 Wolf et ah, 2009; Yu-Feng et al., 2007 [rfan et ah 2008). 





































Figure 5: ADHD analysis: top panels are estimated regression coefficients; bottom 
panels are the p-value maps, where the red regions correspond to the p-value less than 
the threshold 0.05. The left four plots show the OLS estimation, and the right four the 
envelope estimation. For each estimator, two angles of views are shown. 

8 Discussion 

In this article, we have proposed a parsimonious model for regression with a tensor re¬ 
sponse and a vector of predictors. Adopting a generalized sparsity principle, we have 
developed an envelope based estimator that can identify and focus on the material infor¬ 
mation of the tensor response. By doing so, the number of free parameters is effectively 
reduced, and the resulting estimator is asymptotically efficient. Both simulations and 
real data analysis have demonstrated effectiveness of the new estimator. 

We make some remarks about practical use of our proposed method. First, we suggest 
to combine the coefficient map and the p-value map in practice to help identify relevant 
signal regions. We have observed that the p-value map using the OLS asymptotic covari¬ 
ance can be conservative especially when the true signal is weak, whereas the coefficient 
map can often provide a useful recovery. On the other hand, the coefficient map may 
include many small signal regions, while the p-value map is usually more clean. Second, 
the working envelope dimension Uk is the main tuning parameter in our proposal. In 
principle, if the selected working dimension is smaller than the truth, the corresponding 
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envelope estimator is biased, whereas if the selected dimension is larger than the truth, 
the resulting estimator is unbiased but can be more variable. The selection of envelope 
dimension reflects a bias and variance trade-off. 

A core idea of our proposal is to recognize and focus the estimation based upon the 
relevant information in the tensor response. Sparsity is dehned in a general sense and 
is achieved through the envelope method. This is different from the common strategy 
in sparse modeling that induces sparsity through penalty functions. On the other hand, 
our envelope based estimation can be naturally coupled with penalty functions to attain 
further regularization. This line of work is currently under investigation and consists of 
our future research. 
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